NSF PAR Search | NSF Public Access Repository

ProbRes: Probabilistic jump diffusion for open-world egocentric activity recognition

Kundu, Sanjoy; Vellamcheti, Shanmukha; Aakur, Sathyanarayanan N (October 2025, IEEE International Conference on Computer Vision (ICCV), 2025)

Free, publicly-accessible full text available October 30, 2026
CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding

Chen, Zhou; Lin, Joe; Aakur, Sathyanarayanan N (August 2025, Proceedings of Machine Learning Research vol 284:1–10, 2025 19th Conference on Neurosymbolic Learning and Reasoning)

Free, publicly-accessible full text available August 29, 2026
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning

Kundu, Sanjoy; Trehan, Shubham; Aakur, Sathyanarayanan N (November 2024, Springer)
Leonardis, Aleš; Ricci, Elisa; Roth, Stefan; Russakovsky, Olga; Sattler, Torsten; Varol, Gül (Ed.)
Learning to infer labels in an open world, i.e., in an environment where the target “labels” are unknown, is an important characteristic for achieving autonomy. Foundation models, pre-trained on enormous amounts of data, have shown remarkable generalization skills through prompting, particularly in zero-shot inference. However, their performance is limited by the correctness of the target label’s search space, i.e., the candidate labels provided in the prompt. This target search space can be unknown or exceptionally large in an open world, severely restricting their performance. To tackle this challenging problem, we propose a two-step, neuro-symbolic framework called ALGO (Action Learning with Grounded Object recognition) that uses symbolic knowledge stored in large-scale knowledge bases to infer activities in egocentric videos with limited supervision. First, we propose a neuro-symbolic prompting approach that uses object-centric vision-language models as a noisy oracle to ground objects in the video through evidence-based reasoning. Second, driven by prior commonsense knowledge, we discover plausible activities through an energy-based symbolic pattern theory framework and learn to ground knowledge-based action (verb) concepts in the video. Extensive experiments on four publicly available datasets (EPIC-Kitchens, GTEA Gaze, GTEA Gaze Plus, and Charades-Ego) demonstrate its performance on open-world activity inference. ALGO can also be extended to zero-shot inference, where it demonstrates competitive performance.
Full Text Available
EASE: Embodied Active Event Perception via Self-Supervised Energy Minimization

https://doi.org/10.1109/LRA.2025.3583626

Chen, Zhou; Kundu, Sanjoy; Baweja, Harsimran S; Aakur, Sathyanarayanan N (August 2025, IEEE Robotics and Automation Letters)

Free, publicly-accessible full text available August 1, 2026
Self-supervised Multi-actor Social Activity Understanding in Streaming Videos

Trehan, Shubham; Aakur, Sathyanarayanan N (September 2024, Springer)

This work addresses the problem of Social Activity Recognition (SAR), a critical component in real-world tasks like surveillance and assistive robotics. Unlike traditional event understanding approaches, SAR necessitates modeling individual actors' appearance and motions and contextualizing them within their social interactions. Traditional action localization methods fall short due to their single-actor, single-action assumption. Previous SAR research has relied heavily on densely annotated data, but privacy concerns limit their applicability in real-world settings. In this work, we propose a self-supervised approach based on multi-actor predictive learning for SAR in streaming videos. Using a visual-semantic graph structure, we model social interactions, enabling relational reasoning for robust performance with minimal labeled data. The proposed framework achieves competitive performance on standard group activity recognition benchmarks. Evaluation on three publicly available action localization benchmarks demonstrates its generalizability to arbitrary action localization.
Full Text Available
Capturing Temporal Components for Time Series Classification

Vavilthota, Venkata Ragavendra; Ramanathan, Ranjith; Aakur, Sathyanarayanan N (September 2024, Springer)

Analyzing sequential data is crucial in many domains, particularly due to the abundance of data collected from the Internet of Things paradigm. Time series classification, the task of categorizing sequential data, has gained prominence, with machine learning approaches demonstrating remarkable performance on public benchmark datasets. However, progress has primarily been in designing architectures for learning representations from raw data at fixed (or ideal) time scales, which can fail to generalize to longer sequences. This work introduces a compositional representation learning approach trained on statistically coherent components extracted from sequential data. Based on a multi-scale change space, an unsupervised approach is proposed to segment the sequential data into chunks with similar statistical properties. A sequence-based encoder model is trained in a multi-task setting to learn compositional representations from these temporal components for time series classification. We demonstrate its effectiveness through extensive experiments on publicly available time series classification benchmarks. Evaluating the coherence of segmented components shows its competitive performance on the unsupervised segmentation task.
Full Text Available
Shape-Graph Matching Network (SGM-net): Registration for Statistical Shape Analysis

https://doi.org/10.1109/ISBI56570.2024.10635203

Liang, Shenyuan; Srivastava, Anuj; Segundo, Mauricio Pamplona; Sarkar, Sudeep; Aakur, Sathyanarayanan N (May 2024, IEEE)

This paper focuses on the registration problem of shape graphs, where a shape graph is a set of nodes connected by articulated curves with arbitrary shapes. This registration requires optimization over the permutation group, made challenging by differences in nodes (in terms of numbers, locations) and edges (in terms of shapes, placements, and sizes) across graphs. We tackle this registration problem using a neural network architecture with an unsupervised loss function based on the elastic shape metric for curves. This architecture results in (1) state-of-the-art matching performance and (2) an order of magnitude reduction in the computational cost relative to baseline approaches. We demonstrate the effectiveness of the proposed approach using both simulated data and real-world 2D retinal blood vessels and 3D microglia graphs.
Full Text Available
ProtoKD: Learning from Extremely Scarce Data for Parasite Ova Recognition

https://doi.org/10.1109/ICMLA58977.2023.00100

Trehan, Shubham; Ramachandran, Udhav; Scimeca, Ruth; Aakur, Sathyanarayanan N (December 2023, IEEE)

Developing reliable computational frameworks for early parasite detection, particularly at the ova (or egg) stage, is crucial for advancing healthcare and effectively managing potential public health crises. While deep learning has significantly assisted human workers in various tasks, its application in diagnostics has been constrained by the need for extensive datasets. The ability to learn from an extremely scarce training dataset, i.e., when fewer than 5 examples per class are present, is essential for scaling deep learning models in biomedical applications where large-scale data collection and annotation can be expensive or not possible (in case of novel or unknown infectious agents). In this study, we introduce ProtoKD, one of the first approaches to tackle the problem of multi-class parasitic ova recognition using extremely scarce data. Combining the principles of prototypical networks and self-distillation, we can learn robust representations from only one sample per class. Furthermore, we establish a new benchmark to drive research in this critical direction and validate that the proposed ProtoKD framework achieves state-of-the-art performance. Additionally, we evaluate the framework's generalizability to other downstream tasks by assessing its performance on a large-scale taxonomic profiling task based on metagenomes sequenced from real-world clinical data.
Full Text Available
Leveraging Symbolic Knowledge Bases for Commonsense Natural Language Inference using Pattern Theory

https://doi.org/10.1109/TPAMI.2023.3287837

Aakur, Sathyanarayanan N.; Sarkar, Sudeep (June 2023, IEEE Transactions on Pattern Analysis and Machine Intelligence)

The commonsense natural language inference (CNLI) tasks aim to select the most likely follow-up statement to a contextual description of ordinary, everyday events and facts. Current approaches to transfer learning of CNLI models across tasks require large amounts of labeled data from the new task. This paper presents a way to reduce this need for additional annotated training data from the new task by leveraging symbolic knowledge bases, such as ConceptNet. We formulate a teacher-student framework for mixed symbolic-neural reasoning, with the large-scale symbolic knowledge base serving as the teacher and a trained CNLI model as the student. This hybrid distillation process involves two steps. The first step is a symbolic reasoning process. Given a collection of unlabeled data, we use an abductive reasoning framework based on Grenander's pattern theory to create weakly labeled data. Pattern theory is an energy-based graphical probabilistic framework for reasoning among random variables with varying dependency structures. In the second step, the weakly labeled data, along with a fraction of the labeled data, is used to transfer-learn the CNLI model into the new task. The goal is to reduce the fraction of labeled data required. We demonstrate the efficacy of our approach by using three publicly available datasets (OpenBookQA, SWAG, and HellaSWAG) and evaluating three CNLI models (BERT, LSTM, and ESIM) that represent different tasks. We show that, on average, we achieve 63% of the top performance of a fully supervised BERT model with no labeled data. With only 1000 labeled samples, we can improve this performance to 72%. Interestingly, without training, the teacher mechanism itself has significant inference power. The pattern theory framework achieves 32.7% accuracy on OpenBookQA, outperforming transformer-based models such as GPT (26.6%), GPT-2 (30.2%), and BERT (27.1%) by a significant margin.
We demonstrate that the framework can be generalized to successfully train neural CNLI models using knowledge distillation under unsupervised and semi-supervised learning settings. Our results show that it outperforms all unsupervised and weakly supervised baselines and some early supervised approaches, while offering competitive performance with fully supervised baselines. Additionally, we show that the abductive learning framework can be adapted for other downstream tasks, such as unsupervised semantic textual similarity, unsupervised sentiment classification, and zero-shot text classification, without significant modification to the framework. Finally, user studies show that the generated interpretations enhance its explainability by providing key insights into its reasoning mechanism.
Full Text Available
IS-GGT: Iterative Scene Graph Generation with Generative Transformers

https://doi.org/10.1109/CVPR52729.2023.00609

Kundu, Sanjoy; Aakur, Sathyanarayanan N. (June 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection, to name a few. Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene, which adds computational overhead to the approach. This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. Using two transformer-based components, we first sample a possible scene graph structure from detected objects and their visual features. We then perform predicate classification on the sampled edges to generate the final scene graph. This approach allows us to efficiently generate scene graphs from images with minimal inference overhead. Extensive experiments on the Visual Genome dataset demonstrate the efficiency of the proposed approach. Without bells and whistles, we obtain, on average, 20.7% mean recall (mR@100) across different settings for scene graph generation (SGG), outperforming state-of-the-art SGG approaches while offering competitive performance to unbiased SGG approaches.
Full Text Available
